3 research outputs found

    The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages

    Gender biases in language generation systems are challenging to mitigate. One possible source of these biases is disparities in gender representation in the training and evaluation data. Despite recent progress in documenting this problem and many attempts to mitigate it, we still lack a shared methodology and tooling for reporting gender representation in large datasets. Such quantitative reporting will enable further mitigation, e.g., via data augmentation. This paper describes the Gender-GAP Pipeline (Gender-Aware Polyglot Pipeline), an automatic pipeline that characterizes gender representation in large-scale datasets for 55 languages. The pipeline uses a multilingual lexicon of gendered person-nouns to quantify gender representation in text. We showcase it by reporting gender representation in the WMT training and development data for the News task, confirming that current data is skewed towards masculine representation. Training on unbalanced datasets may indirectly optimize our systems to perform better for one gender than for the others. We suggest applying our gender quantification pipeline to current datasets and, ideally, modifying them toward a balanced representation. Comment: 15 pages
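
    As a rough illustration of lexicon-based counting, the following Python sketch tallies matches against a lexicon of gendered person-nouns and reports each gender label's share of all matches. It is a minimal sketch under stated assumptions, not the Gender-GAP implementation: the toy English lexicon `TOY_LEXICON`, its labels, the regex tokenizer, and the function names `count_gendered_nouns` and `gender_distribution` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: each gendered person-noun maps to a gender label.
# These few English entries are illustrative only; they are not taken from
# the Gender-GAP lexicon.
TOY_LEXICON = {
    "man": "masculine", "men": "masculine", "father": "masculine",
    "king": "masculine", "woman": "feminine", "women": "feminine",
    "mother": "feminine", "queen": "feminine", "person": "unspecified",
    "people": "unspecified",
}


def count_gendered_nouns(text, lexicon):
    """Count lexicon matches per gender label in one text segment."""
    # Naive lowercase word tokenization; a multilingual pipeline would need
    # language-specific tokenization.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def gender_distribution(segments, lexicon):
    """Aggregate counts over a dataset and return each label's share of matches."""
    totals = Counter()
    for segment in segments:
        totals.update(count_gendered_nouns(segment, lexicon))
    n = sum(totals.values())
    # Return an empty dict when nothing in the dataset matched the lexicon.
    return {label: count / n for label, count in totals.items()} if n else {}


if __name__ == "__main__":
    corpus = [
        "The king met his father and two men.",
        "A woman spoke to the people.",
    ]
    print(gender_distribution(corpus, TOY_LEXICON))
```

    On the two sample sentences, three of the five matches are masculine, so the masculine share is 0.6. Plain word matching like this ignores morphology, ambiguity, and context, which a pipeline covering 55 languages would likely need to address.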

    SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

    What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to make similar strides. More specifically, conventional speech-to-speech translation relies on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build it, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. We then created a multimodal corpus of automatically aligned speech translations. By filtering this corpus and combining it with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translation into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. In robustness tests, our system outperforms the current SOTA model in speech-to-text tasks under background noise and speaker variation. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communicatio
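
    The abstract reports one result as a percentage (20% BLEU) and the others in BLEU points (1.3 and 2.6). Assuming the 20% figure is a relative improvement, the two kinds of figures are not directly comparable. The Python sketch below uses hypothetical scores, not numbers from the paper, to show the difference: going from 20.0 to 24.0 BLEU is a 4.0-point absolute gain but a 20% relative gain. The function names `absolute_gain` and `relative_gain` are illustrative.

```python
def absolute_gain(new_score, baseline):
    """Difference in BLEU points between a new system and a baseline."""
    return new_score - baseline


def relative_gain(new_score, baseline):
    """Improvement expressed as a fraction of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (new_score - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical scores, not taken from the SeamlessM4T paper.
    baseline_bleu, new_bleu = 20.0, 24.0
    print(f"absolute: {absolute_gain(new_bleu, baseline_bleu):.1f} BLEU points")
    print(f"relative: {relative_gain(new_bleu, baseline_bleu):.0%}")
```

    The same 4-point gain would be a 40% relative gain over a 10-point baseline, which is why relative percentages and absolute point differences should be read separately.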

    The Influence of New Novel on Persian Literature. A Survey on the issue of newness in novel in Houshang Golshiri’s works

    This thesis takes a comparative approach to the impact of the French New Novel (Nouveau Roman) on Persian literature. The study is limited to the contemporary period from 1340 to 1350 (1960-1970) and focuses on the contribution of periodicals such as Djong-e Esfahan, which appeared in 1344 (1965), and of translation, both of which helped transmit literary innovations to the new Persian novelists. It examines the emergence of the New Novel as an upheaval of traditional literary codes and conventions, in contrast to the established conventions of the literary genres. From then on, the rejection of pastiche and imitation encouraged the new novelists to develop a new aesthetic of writing in fiction. The study concentrates mainly on the parallels between the writing of Golshiri and that of Robbe-Grillet, who served as a source of inspiration for Golshiri. To this end, selected works of fiction by these pioneers of the New Novel serve as the basis for the comparative study. Their transformation of traditionally passive reading into a more dynamic activity than ever leads us to assess the new status of the contemporary reader, trapped in the adventure of writing. The literary works studied in this thesis lie at the intersection of an aesthetic study of narrative form and the primacy of writing over meaning. We will analyse the techniques used to foreground language and the changes these new techniques have brought to the literary text. Through the study of the works in the corpus, we will analyse, from a comparative point of view, the tensions between the notions of the real and the fictional, the Ancient and the Modern, and literary genre and literary movement.